Abstract

Many physical theories like chaos theory are fundamentally concerned with the conceptual tension
between determinism and randomness. Kolmogorov complexity can express randomness in determinism
and gives an approach to formulate chaotic behavior.

1 Introduction 

Ideally, physical theories are abstract representations, mathematical axiomatic theories, for the underlying
physical reality. This reality cannot be directly experienced, and is therefore unknown and even in principle
unknowable. Instead, scientists postulate an informal description which is intuitively acceptable, and
subsequently formulate one or more mathematical theories to describe the phenomena.

Deterministic Chaos: Many phenomena in physics (like the weather) satisfy well accepted deterministic 
equations. From initial data we can extrapolate and compute the next states of the system. Traditionally 
it was thought that increased precision of the initial data (measurement) and increased computing power 
would result in increasingly accurate extrapolation (prediction). Unfortunately it turns out that for many
systems this is not the case. In fact, it turns out that any long range prediction with any confidence better 
than what we would get by flipping a fair coin is practically impossible: this phenomenon is known as chaos 
(see [3] for an introduction). There are two, more or less related, causes for this: 

Instability In certain deterministic systems, an arbitrarily small error in initial conditions can exponentially
increase during the subsequent evolution of the system, until it encompasses the full range of values 
achievable by the system. This phenomenon of instability of a computation is in fact well known 
in numerical analysis: computational procedures inverting ill-conditioned matrices (with determinant 
about zero) will introduce exponentially increasing errors. 

Unpredictability Assume we deal with a system described by deterministic equations which can be finitely
represented (like a recursive function). Even if fixed-length initial segments of the infinite binary
representation of the real parameters describing past states of the system are perfectly known, and the
computational procedure used is perfectly error free, for many such systems it will still be impossible
to effectively predict (compute) any significantly long extrapolation of system states with any
confidence higher than using a random coin flip. This is the core of chaotic phenomena: randomness in
determinism.

Probability: Classical probability theory deals with randomness in the sense of random variables. The
concept of random individual data cannot be expressed. Yet our intuition about the latter is very strong:
An adversary claims to have a true random coin and invites us to bet on the outcome. The coin produces
a hundred heads in a row. We say that the coin cannot have been fair. The adversary, however, appeals to
probability theory which says that each sequence of outcomes of a hundred coin flips is equally likely, $1/2^{100}$,
and one sequence had to come up. Probability theory gives us no basis to challenge an outcome after it has
happened. We could only exclude unfairness in advance by putting a penalty side-bet on an outcome of 100
heads. But what about 1010...? What about an initial segment of the binary expansion of $\pi$?

*This paper is based on a talk by the author at the University of Waterloo, Canada, in 1991. Partially supported by EU
through NeuroColt II Working Group and the QAIP Project. Address: Centrum voor Wiskunde en Informatica, Kruislaan
413, 1098 SJ Amsterdam, The Netherlands. Email: paulv@cwi.nl
Random sequence    Pr(10010011011000111011010000)

The first sequence is regular, but what is the distinction between the second sequence and the third? The third
sequence was generated by flipping a quarter. The second sequence is very regular: 0, 1, 00, 01, .... The third
sequence will pass (pseudo) randomness tests.

In fact, classical probability theory cannot express the notion of randomness of an individual sequence. 
It can only express expectation of properties of the total set of sequences under some distribution. 

This is analogous to the situation in physics above. 'How can an individual object be random?' is as
much a probability theory paradox as 'How can an individual sequence of states of a deterministic system be
random?' is a paradox of deterministic physical systems.

In probability theory the problem has found a satisfactory resolution by combining notions of computability
and information theory to express the complexity of a finite object. This complexity is the length of
the shortest binary program from which the object can be effectively reconstructed. It may be called the
algorithmic information content of the object. This quantity turns out to be an attribute of the object alone,
and recursively invariant. It is the Kolmogorov complexity of the object. It turns out that this notion can
be brought to bear on the physical riddles too.
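
As a rough illustration of "shortest description" (not a computation of Kolmogorov complexity, which is uncomputable), the length of a compressed encoding gives an upper-bound proxy. The sketch below assumes Python's standard `zlib` compressor as a stand-in for a description method; the helper name `compressed_length_bits` and the two sample inputs are chosen here for illustration only.

```python
import os
import zlib

def compressed_length_bits(data: bytes) -> int:
    """Length in bits of a zlib encoding of data.

    This is only a crude upper-bound proxy for the description length;
    it is not the Kolmogorov complexity C(x).
    """
    return 8 * len(zlib.compress(data, 9))

regular = b"\x00" * 4096        # a highly regular input: 4096 zero bytes
random_like = os.urandom(4096)  # bytes from the operating system's entropy source

print("regular:    ", compressed_length_bits(regular), "bits")
print("random-like:", compressed_length_bits(random_like), "bits")
```

The regular input compresses to a small fraction of its 32768 bits, while the random-like input does not compress at all, which matches the intuition that regularities are what make short descriptions possible.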

2 Kolmogorov Complexity

To make this paper self-contained we briefly review notions and properties required. For details and further
properties see the textbook [12]. We identify the natural numbers $\mathcal{N}$ and the finite binary sequences as

$$(0, \epsilon), (1, 0), (2, 1), (3, 00), (4, 01), \ldots$$

where $\epsilon$ is the empty sequence. The length $l(x)$ of a natural number $x$ is the number of bits in the
corresponding binary sequence. For instance, $l(\epsilon) = 0$. If $A$ is a set, then $|A|$ denotes the cardinality of $A$. Let
$\langle \cdot \rangle : \mathcal{N} \times \mathcal{N} \to \mathcal{N}$ denote a standard computable bijective 'pairing' function. Throughout this paper, we will
assume that $\langle x, y \rangle = 1^{l(x)} 0 x y$.
Define $\langle x, y, z \rangle$ by $\langle x, \langle y, z \rangle \rangle$.

We need some notions from the theory of algorithms, see [15]. Let $\phi_1, \phi_2, \ldots$ be a standard enumeration
of the partial recursive functions. The (Kolmogorov) complexity of $x \in \mathcal{N}$, given $y$, is defined as

$$C(x|y) = \min\{ l(\langle n, z \rangle) : \phi_n(\langle y, z \rangle) = x \}.$$

This means that $C(x|y)$ is the minimal number of bits in a description from which $x$ can be effectively
reconstructed, given $y$. The unconditional complexity is defined as $C(x) = C(x|\epsilon)$.
An alternative definition is as follows. Let

$$C_\psi(x|y) = \min\{ l(z) : \psi(\langle y, z \rangle) = x \} \qquad (1)$$

be the conditional complexity of $x$ given $y$ with reference to decoding function $\psi$. Then $C(x|y) = C_\psi(x|y)$
for a universal partial recursive function $\psi$ that satisfies $\psi(\langle y, n, z \rangle) = \phi_n(\langle y, z \rangle)$.
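
The identification of natural numbers with binary strings and the pairing function assumed in this section can be sketched in a few lines. The function names (`nat_to_str`, `str_to_nat`, `pair`, `unpair`) are hypothetical and introduced here only for illustration; strings are represented as Python `str` objects over the characters 0 and 1.

```python
from typing import Tuple

def nat_to_str(n: int) -> str:
    """Map 0, 1, 2, 3, 4, ... to '', '0', '1', '00', '01', ...

    Write n + 1 in binary and drop the leading 1.
    """
    return bin(n + 1)[3:]  # bin() yields '0b1...'; strip the '0b1' prefix

def str_to_nat(s: str) -> int:
    """Inverse of nat_to_str."""
    return int("1" + s, 2) - 1

def pair(x: str, y: str) -> str:
    """The pairing <x, y> = 1^{l(x)} 0 x y used in the text."""
    return "1" * len(x) + "0" + x + y

def unpair(p: str) -> Tuple[str, str]:
    """Recover (x, y) from pair(x, y): the run of 1s before the first 0 gives l(x)."""
    k = p.index("0")
    return p[k + 1 : 2 * k + 1], p[2 * k + 1 :]

print([nat_to_str(n) for n in range(5)])
print(pair("01", "110"), unpair(pair("01", "110")))
```

The unary prefix costs $l(x) + 1$ extra bits, which is the price of being able to tell where $x$ ends and $y$ begins.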
We will also make use of the prefix complexity $K(x)$, which denotes the length of the shortest self-delimiting
description. To this end, we consider so called prefix Turing machines, which have only 0's and 1's on their input
tape, and thus cannot detect the end of the input. Instead we define an input as that part of the input tape
which the machine has read when it halts. When $x \ne y$ are two such inputs, we clearly have that $x$ cannot
be a prefix of $y$, and hence the set of inputs forms what is called a prefix code. We define $K(x)$ similarly
as above, with reference to a universal prefix machine that first reads $1^n 0$ from the input tape and then
simulates prefix machine $n$ on the rest of the input.

We need the following properties. Throughout 'log' denotes the binary logarithm. We often use $O(f(n)) = -O(f(n))$,
so that $O(f(n))$ may denote a negative quantity. For each $x, y \in \mathcal{N}$ we have

$$C(x|y) \le l(x) + O(1). \qquad (2)$$

For each $y \in \mathcal{N}$ there is an $x \in \mathcal{N}$ of length $n$ such that $C(x|y) \ge n$. In particular, we can set $y = \epsilon$. Such
x's may be called random, since they are without regularities that can be used to compress the description.
Intuitively, the shortest effective description of $x$ is $x$ itself. In general, for each $n$ and $y$, there are at least
$2^n - 2^{n-c} + 1$ distinct x's of length $n$ with

$$C(x|y) \ge n - c. \qquad (3)$$
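
The count behind Equation 3 is a short pigeonhole argument, sketched here for completeness. It uses only the fact that each binary description decodes, given $y$, to at most one string.

```latex
% Number of binary descriptions of length less than n - c:
\sum_{i=0}^{n-c-1} 2^{i} \;=\; 2^{\,n-c} - 1 .
% Each description yields at most one x, so at most 2^{n-c} - 1 strings x of
% length n satisfy C(x|y) < n - c. Of the 2^n strings of length n, therefore,
\#\{\, x : l(x) = n,\ C(x|y) \ge n - c \,\} \;\ge\; 2^{n} - \bigl(2^{\,n-c} - 1\bigr)
  \;=\; 2^{n} - 2^{\,n-c} + 1 .
```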

In some cases we want to encode $x$ in self-delimiting form $x'$, in order to be able to decompose $x'y$ into
$x$ and $y$. Good upper bounds on the prefix complexity of $x$ are obtained by iterating the simple rule that a
self-delimiting (s.d.) description of the length of $x$ followed by $x$ itself is a s.d. description of $x$. For example,
$x' = 1^{l(x)} 0 x$ and $x'' = 1^{l(l(x))} 0\, l(x)\, x$ are both s.d. descriptions for $x$, and this shows that $K(x) \le 2l(x) + O(1)$
and $K(x) \le l(x) + 2l(l(x)) + O(1)$.
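
The two self-delimiting codes can be made concrete as follows. This is a minimal sketch under one stated assumption: the length $l(x)$ is written in ordinary binary (so length 0 is written as `0`), which differs by at most a constant from the number-to-string identification used in the text. All function names are hypothetical.

```python
from typing import Tuple

def sd1(x: str) -> str:
    """x' = 1^{l(x)} 0 x, using 2 l(x) + 1 bits."""
    return "1" * len(x) + "0" + x

def sd2(x: str) -> str:
    """x'' = 1^{l(l(x))} 0 l(x) x, with l(x) written in ordinary binary (assumption)."""
    length = format(len(x), "b")
    return "1" * len(length) + "0" + length + x

def decode_sd1(stream: str) -> Tuple[str, str]:
    """Split a stream that starts with sd1(x) into (x, remainder)."""
    k = stream.index("0")          # k = l(x)
    return stream[k + 1 : 2 * k + 1], stream[2 * k + 1 :]

def decode_sd2(stream: str) -> Tuple[str, str]:
    """Split a stream that starts with sd2(x) into (x, remainder)."""
    k = stream.index("0")          # k = number of bits used to write l(x)
    n = int(stream[k + 1 : 2 * k + 1], 2)
    start = 2 * k + 1
    return stream[start : start + n], stream[start + n :]

x, y = "10110", "0011"
print(len(sd1(x)), len(sd2(x)))
print(decode_sd1(sd1(x) + y), decode_sd2(sd2(x) + y))
```

Because neither decoder needs an end marker, the concatenation of a code word with arbitrary further bits can still be split unambiguously, which is the prefix property.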

Similarly, we can encode $x$ in a self-delimiting form of its shortest program $p(x)$ ($l(p(x)) = C(x)$) in
$2C(x) + 1$ bits. Iterating this scheme, we can encode $x$ as a self-delimiting program of $C(x) + 2\log C(x) + 1$
bits, which shows that $K(x) \le C(x) + 2\log C(x) + 1$, and so on.

The string sqi has length at most n - S(n) - O(1) and can be padded

2.1 Random Sequences 

We would like to call an infinite sequence $\omega \in \{0,1\}^\infty$ random if $C(\omega_{1:n}) \ge n - O(1)$ for all $n$. It turns out
that such sequences do not exist. This occasioned the celebrated theory of randomness of P. Martin-Löf,
[14]. Later it turned out, [1], that we can yet precisely define the Martin-Löf random sequences, but using
prefix Kolmogorov complexity.

Theorem 1 An infinite binary sequence $\omega$ is random in the sense of Martin-Löf iff there is an $n_0$ such that
$K(\omega_{1:n}) \ge n$ for all $n > n_0$.

That $\omega$ is random in Martin-Löf's sense means that it will pass all effective tests for randomness: both
the tests which are known now and the ones which are as yet unknown [14].

Similar properties hold for high-complexity finite strings, although in a less absolute sense. 

For every finite set $S \subseteq \{0,1\}^*$ containing $x$ we have $K(x|S) \le \log|S| + O(1)$. Indeed, consider the
self-delimiting code of $x$ consisting of its $\lceil \log|S| \rceil$ bit long index of $x$ in the lexicographical ordering of $S$.
This code is called data-to-model code. The lack of typicality of $x$ with respect to $S$ is the amount by which
$K(x|S)$ falls short of the length of the data-to-model code. The randomness deficiency of $x$ in $S$ is defined
by

$$\delta(x|S) = \log|S| - K(x|S), \qquad (4)$$

for $x \in S$, and $\infty$ otherwise. If $\delta(x|S)$ is small, then $x$ may be considered as a typical member of $S$. There
are no simple special properties that single it out from the majority of elements in $S$. This is not just
terminology: If $\delta(x|S)$ is small, then $x$ satisfies all properties of low Kolmogorov complexity that hold with
high probability for the elements of $S$. For example: Consider strings $x$ of length $n$ and let $S = \{0,1\}^n$ be
a set of such strings. Then $\delta(x|S) = n - K(x|n) + O(1)$.

(i) If $P$ is a property satisfied by all $x$ with $\delta(x|S) \le \delta(n)$, then $P$ holds with probability at least $1 - 1/2^{\delta(n)}$
for the elements of $S$.

(ii) Let $P$ be any property that holds with probability at least $1 - 1/2^{\delta(n)}$ for the elements of $S$. Then,
every such $P$ holds simultaneously for every $x \in S$ with $\delta(x|S) \le \delta(n) - K(P|n) - O(1)$.
3 Algorithmic Chaos Theory



For convenience assume that time is discrete: $\mathcal{N}$. In a deterministic system $X$ the state of the system at
time $t$ is $X_t$. The orbit of the system is the sequence of subsequent states $X_0, X_1, X_2, \ldots$. For convenience
we assume the states are elements of $\{0,1\}$. The definitions below are easily generalized. For each system,
be it deterministic or random, we associate a measure $\mu$ with the space $\{0,1\}^\infty$ of orbits. That is, $\mu(x)$ is
the probability that an orbit starts with $x \in \{0,1\}^*$.

Given an initial segment $X_{0:t}$ of the orbit we want to compute $X_{t+1}$. Even if it would not be possible to
compute $X_{t+1}$, we would like to compute a prediction of it which does better than a random coin flip.

Definition 1 Let the set of orbits be $S = \{0,1\}^\infty$ with the Lebesgue measure $\lambda$. Let $\phi$ be a partial recursive
function and let $\omega \in S$. Define

$$\zeta_i = \begin{cases} 1 & \text{if } \phi(\omega_{1:i-1}) = \omega_i \\ 0 & \text{otherwise} \end{cases}$$

A deterministic system is chaotic if, for every computable function $\phi$, we have

$$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \zeta_i = \frac{1}{2},$$

with probability 1.




A well-known example of a chaotic system is the doubling map, [5]. Consider the deterministic system $X$
with initial state $X_0 = 0.\omega$ a real number in the interval [0, 1] where $\omega \in S$ is the binary representation.

$$X_{n+1} = 2X_n \pmod 1 \qquad (5)$$

where (mod 1) means drop the integer part. Thus, all iterates of equation 5 lie in the unit interval [0, 1].
In physics, this is called the 'energy surface' of the orbit. We can partition this energy surface into two cells,
a left cell $[0, \frac{1}{2})$ and a right cell $[\frac{1}{2}, 1]$. Thus $X_n$ lies in the left cell if and only if the nth digit of $\omega$ is 0.
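
The correspondence between the orbit of the doubling map and the binary digits of the initial state can be checked with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the function names and the sample bit string are hypothetical. It counts digits so that the initial state $X_0$ reads the first digit of $\omega$, and it treats a finite bit string as an expansion followed by zeros.

```python
from fractions import Fraction
from typing import List

def doubling_orbit(bits: str, steps: int) -> List[Fraction]:
    """Iterate X_{n+1} = 2 X_n (mod 1) exactly, starting from X_0 = 0.bits in binary."""
    x = Fraction(int(bits, 2), 2 ** len(bits))
    orbit = []
    for _ in range(steps):
        orbit.append(x)
        x = (2 * x) % 1  # drop the integer part
    return orbit

def itinerary(orbit: List[Fraction]) -> str:
    """Record '0' for a state in the left cell [0, 1/2) and '1' for the right cell."""
    half = Fraction(1, 2)
    return "".join("0" if x < half else "1" for x in orbit)

bits = "1011001110001011"
print(itinerary(doubling_orbit(bits, len(bits))) == bits)
```

Each iteration shifts the binary expansion one place to the left, so the observed sequence of cells simply replays the digits of the initial state. Exact fractions are used deliberately: a binary floating-point number holds only finitely many digits, so a floating-point orbit reaches 0 after finitely many steps.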

One way to derive the doubling map is as follows: In chaos theory, [3], people have for years been
studying the discrete logistic equation

$$Y_{n+1} = aY_n(1 - Y_n)$$

which maps the unit interval upon itself when $0 \le a \le 4$. When $a = 4$, setting $Y_n = \sin^2 \pi X_n$, we obtain:

$$X_{n+1} = 2X_n \pmod 1.$$
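
The substitution can be verified in one line with the double-angle identity; the steps are spelled out here as a worked check.

```latex
% With a = 4 and Y_n = \sin^2(\pi X_n):
Y_{n+1} = 4\,Y_n\,(1 - Y_n)
        = 4 \sin^2(\pi X_n)\cos^2(\pi X_n)
        = \bigl(2\sin(\pi X_n)\cos(\pi X_n)\bigr)^2
        = \sin^2(2\pi X_n).
% Since \sin^2(\pi(u + k)) = \sin^2(\pi u) for every integer k, the integer part
% of 2X_n can be dropped:
Y_{n+1} = \sin^2\bigl(\pi\,(2X_n \bmod 1)\bigr),
\qquad\text{which is consistent with } X_{n+1} = 2X_n \pmod 1 .
```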

Lemma 1 There are chaotic systems (like X and Y above).

Proof. We prove that X is a chaotic system. Since Y reduces to X by specialization, this shows that
Y is chaotic as well. Assume $\omega$ is random. Then by Theorem 1,

$$C(\omega_{1:n}) \ge n - 2\log n + O(1). \qquad (6)$$

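Let $\phi$ be any partial recursive function. Construct $\zeta$ from $\phi$ and $\omega$ as in Definition 1.
Assume by way of contradiction that there is an $\epsilon > 0$ such that

$$\left| \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \zeta_i - \frac{1}{2} \right| > \epsilon.$$

Then, there is a $\delta > 0$ such that

$$\lim_{n \to \infty} \frac{C(\zeta_{1:n})}{n} \le (1 - \delta). \qquad (7)$$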

We prove this as follows. The number of binary sequences of length n where the numbers of 0's and 1's
differ by at least an $\epsilon n$ is

$$N = 2 \cdot 2^n \sum_{m=(\frac{1}{2}+\epsilon)n}^{n} b(n, m, \tfrac{1}{2})$$


where $b(n, m, p)$ is the probability of $m$ successes out of $n$ trials in a $(p, 1 - p)$ Bernoulli process: the Binomial
distribution. A general estimate of the tail probability of the binomial distribution, with $m$ the number of
successful outcomes in $n$ experiments with probability of success $0 < p < 1$ and $q = 1 - p$, is given by
Chernoff's bounds, [4, 2],

$$\Pr(|m - np| > \epsilon n) \le 2e^{-(\epsilon n)^2/3n}. \qquad (8)$$
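
For the fair-coin case $p = 1/2$ the bound of Equation 8 can be compared against the exact binomial tail. The following is a small numerical check, not part of the proof; the function names and the sample values of $n$ and $\epsilon$ are chosen here for illustration.

```python
from math import comb, exp

def exact_tail(n: int, eps: float) -> float:
    """Pr(|m - n/2| > eps * n) for m ~ Binomial(n, 1/2), computed exactly."""
    count = sum(comb(n, m) for m in range(n + 1) if abs(m - n / 2) > eps * n)
    return count / 2 ** n

def chernoff_bound(n: int, eps: float) -> float:
    """Right-hand side of Equation 8: 2 exp(-(eps n)^2 / (3 n))."""
    return 2 * exp(-((eps * n) ** 2) / (3 * n))

for n in (50, 200, 1000):
    for eps in (0.05, 0.1, 0.2):
        print(n, eps, exact_tail(n, eps), chernoff_bound(n, eps))
```

In every row the exact tail stays below the bound, and both shrink exponentially in $n$ for fixed $\epsilon$, which is what the counting step of the proof relies on.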

Therefore, we can describe any element $\zeta_{1:n}$ concerned by giving $n$ and $\epsilon n$ in $2\log n + 4\log\log n$ bits
self-delimiting descriptions, and pointing out the string concerned in a constrained ensemble of at most $N$ elements
in $\log N$ bits. Therefore,

$$C(\zeta_{1:n}) \le n - \epsilon^2 n \log e + 2\log n + 4\log\log n + O(1).$$

That is, we can choose

$$\delta = \epsilon^2 \log e - \frac{2\log n + 4\log\log n + O(1)}{n}.$$

Next, given $\zeta$ and $\phi$ we can reconstruct $\omega$ as follows:

for i := 1, 2, ... do:
    if $\phi(\omega_{1:i-1}) = a$ and $\zeta_i = 0$ then $\omega_i := \neg a$
    else $\omega_i := a$.
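
The reconstruction step is easy to make executable once the predictor is assumed to be total and to output a bit. The sketch below represents sequences as Python lists of 0/1 values; `majority` is a hypothetical example predictor, and `success_indicators` computes the hit/miss sequence from Definition 1.

```python
from typing import Callable, List

Predictor = Callable[[List[int]], int]  # maps a prefix to a predicted next bit

def success_indicators(omega: List[int], phi: Predictor) -> List[int]:
    """zeta_i = 1 if phi predicts bit i correctly from the preceding bits, else 0."""
    return [1 if phi(omega[:i]) == omega[i] else 0 for i in range(len(omega))]

def reconstruct(zeta: List[int], phi: Predictor) -> List[int]:
    """Rebuild omega from zeta and phi: keep the prediction on a hit, flip it on a miss."""
    omega: List[int] = []
    for hit in zeta:
        a = phi(omega)
        omega.append(a if hit == 1 else 1 - a)
    return omega

def majority(prefix: List[int]) -> int:
    """Hypothetical predictor: guess the bit seen most often so far (0 on ties)."""
    return 1 if 2 * sum(prefix) > len(prefix) else 0

omega = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1]
zeta = success_indicators(omega, majority)
print(reconstruct(zeta, majority) == omega)
```

Since the map from $\omega$ to $\zeta$ is invertible given $\phi$, a short description of $\zeta$ together with a description of $\phi$ yields a short description of $\omega$, which is the content of Equation 9.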

Therefore,

$$C(\omega_{1:n}) \le C(\zeta_{1:n}) + K(\phi) + O(1). \qquad (9)$$

Now Equations 6, 7, 9 give the desired contradiction. By Theorem 1, the set of $\omega$'s satisfying Equation 6
has uniform measure one, which proves the lemma. □

In [5] the argument is as follows. Assuming that the initial state is randomly drawn from [0, 1) according 
to the uniform measure A, we can use complexity arguments to show that the doubling map's observable 
orbit cannot be predicted better than a coin toss. Namely, with A-probability 1 the drawn initial state will be 
a Martin-Löf random infinite sequence. Such sequences by definition cannot be effectively predicted better
than a random coin toss, see [14]. 

But in this case we do not need to go to such trouble. The observed orbit essentially consists of the
consecutive bits of the initial state. Drawing the initial state according to the uniform measure is isomorphic
to flipping a fair coin to generate it. This raises the challenging problem of a meaningful application of
Kolmogorov complexity to chaos problems.

From a practical viewpoint it may be argued that we really are not interested in infinite sequences: in
practice the input will always be finite precision. Now an infinite sequence which is random may still have
an arbitrarily long finite initial segment which is completely regular. Therefore, we analyse the theory for
finite precision inputs in the following section.

3.1 Chaos with Finite Precision Input 

In the case of infinite precision real inputs, the distinction between chaotic and non-chaotic systems can be 
precisely drawn. In the case of finite precision inputs the distinction is necessarily a matter of degree. This 
occasions the following definition.

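Definition 2 Let $S, \lambda, \phi, \omega$ and $\zeta$ be as in Definition 1. A deterministic system with input precision $n$ is
$(\epsilon, \delta)$-chaotic if, for every computable function $\phi$, we have

$$\left| \frac{1}{n} \sum_{i=1}^{n} \zeta_i - \frac{1}{2} \right| \le \epsilon,$$

with probability at least $1 - \delta$.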

So systems are chaotic in the sense of Definition 1, like the doubling map above, iff they are (0, 0)-chaotic
with precision $\infty$. The system is probably approximately unpredictable: a pau-chaotic system.
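
The condition in Definition 2 can be explored empirically for a single fixed predictor. This is only an illustration under loud assumptions: pseudo-random bits from Python's `random` module stand in for uniformly drawn initial segments, one hypothetical predictor (`repeat_last`) stands in for "every computable function", and the parameter values are arbitrary.

```python
import random
from typing import Callable, List

Predictor = Callable[[List[int]], int]

def success_frequency(omega: List[int], phi: Predictor) -> float:
    """Fraction of positions i at which phi predicts omega[i] from omega[:i]."""
    hits = sum(1 for i in range(len(omega)) if phi(omega[:i]) == omega[i])
    return hits / len(omega)

def repeat_last(prefix: List[int]) -> int:
    """Hypothetical predictor: guess that the next bit repeats the previous one."""
    return prefix[-1] if prefix else 0

def fraction_within(eps: float, n: int, trials: int, phi: Predictor, seed: int = 0) -> float:
    """Fraction of sampled n-bit inputs whose success frequency is within eps of 1/2."""
    rng = random.Random(seed)  # seeded so that the experiment is repeatable
    ok = 0
    for _ in range(trials):
        omega = [rng.getrandbits(1) for _ in range(n)]
        if abs(success_frequency(omega, phi) - 0.5) <= eps:
            ok += 1
    return ok / trials

print(fraction_within(eps=0.1, n=500, trials=100, phi=repeat_last))
```

The printed fraction plays the role of the probability $1 - \delta$ in the definition: for this predictor nearly all sampled inputs keep the success frequency within $\epsilon$ of one half.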
Theorem 2 Systems X and Y above are $(\sqrt{(\delta(n) + O(1)) \ln 2 / n},\ 1/2^{\delta(n)})$-chaotic for every function $\delta$ such
that $0 < \delta(n) < n$.

Proof. We prove that X is $(\epsilon, \delta)$-chaotic. Since Y reduces to X, this implies that Y is $(\epsilon, \delta)$-chaotic as
well.

Assume that $x$ is a binary string of length $n$ with

$$C(x) \ge n - \delta(n). \qquad (10)$$

Let $\phi$ be a polynomial time computable function, and define $z$ by:

$$z_i = \begin{cases} 1 & \text{if } \phi(x_{1:i-1}) = x_i \\ 0 & \text{otherwise} \end{cases}$$

Then, $x$ can be reconstructed from $z$ and $\phi$ as before, and therefore:

$$C(x) \le C(z) + K(\phi) + O(1).$$

By Equation 10 this means

$$C(z) \ge n - \delta(n) - K(\phi) + O(1). \qquad (11)$$

We analyse the number of zeros and ones in $z$. Using Chernoff's bounds, Equation 8, with $p = q = \frac{1}{2}$,
the number of z's which have an excess of $\epsilon n$ of ones over zeros is:

$$N \le 2^{n+1} e^{-(\epsilon n)^2/n}$$

with

$$\left| \#\mathrm{ones}(z) - \frac{n}{2} \right| < \epsilon n.$$

Then, we can give an effective description of $z$ by giving a description of $\phi$, $\delta$ and z's index in the set of
size $N$ in this many bits

$$n - \epsilon^2 n \log e + K(\phi) + K(\delta) + 2\log K(\phi)K(\delta) + O(1). \qquad (12)$$

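From Equations 11, 12 we find

$$\epsilon \le \sqrt{\frac{\delta(n) + 2K(\phi) + K(\delta) + 2\log K(\phi)K(\delta) + O(1)}{n \log e}}. \qquad (13)$$

Making the simplifying assumption that $K(\phi), K(\delta) = O(1)$ this yields

$$\left| \#\mathrm{ones}(z) - \frac{n}{2} \right| < \sqrt{(\delta(n) + O(1))\, n \ln 2}. \qquad (14)$$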

The number of binary strings $x$ of length $n$ with $C(x) < n - \delta(n)$ is at most $2^{n-\delta(n)} - 1$ (there are not
more programs of length less than $n - \delta(n)$). Therefore, the uniform probability of a real number starting
with an n-length initial segment $x$ such that $C(x) \ge n - \delta(n)$ is given by:

$$\lambda\{\omega : C(\omega_{1:n}) \ge n - \delta(n)\} > 1 - \frac{1}{2^{\delta(n)}}. \qquad (15)$$

Therefore, system X is $(\epsilon, \delta)$-chaotic with $\epsilon = \sqrt{(\delta(n) + O(1)) \ln 2 / n}$ and $\delta = 1/2^{\delta(n)}$.
□
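
To get a feeling for the magnitudes in Theorem 2, the two parameters can be evaluated for concrete values of $n$ and $\delta(n)$. In this sketch the unspecified $O(1)$ term is replaced by an explicit constant `c` that defaults to 0; that default, the function name, and the sample values are assumptions made here for illustration.

```python
from math import log, sqrt
from typing import Tuple

def chaos_parameters(n: int, delta_n: float, c: float = 0.0) -> Tuple[float, float]:
    """Return (epsilon, failure probability) from Theorem 2.

    epsilon = sqrt((delta(n) + c) ln 2 / n), failure probability = 2^(-delta(n)).
    The constant c stands in for the O(1) term and is an assumption.
    """
    eps = sqrt((delta_n + c) * log(2) / n)
    failure = 2.0 ** (-delta_n)
    return eps, failure

for n, delta_n in ((10 ** 3, 10), (10 ** 6, 20), (10 ** 9, 30)):
    eps, failure = chaos_parameters(n, delta_n)
    print(n, delta_n, eps, failure)
```

With $n = 10^6$ bits of input precision and $\delta(n) = 20$, for example, the success frequency of a predictor deviates from one half by less than about 0.004, except on a set of inputs of probability at most $2^{-20}$ (up to the neglected constant).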